在嘈杂的互联网规模数据集上进行了预测,已对具有广泛的文本,图像和其他模式能力的培训模型进行了大量研究。但是,对于许多顺序决策域,例如机器人技术,视频游戏和计算机使用,公开可用的数据不包含以相同方式训练行为先验所需的标签。我们通过半监督的模仿学习将互联网规模的预处理扩展到顺序的决策域,其中代理通过观看在线未标记的视频来学习行动。具体而言,我们表明,使用少量标记的数据,我们可以训练一个足够准确的反向动力学模型,可以标记一个巨大的未标记在线数据来源 - 在这里,在线播放Minecraft的在线视频 - 然后我们可以从中训练一般行为先验。尽管使用了本地人类界面(鼠标和键盘为20Hz),但我们表明,这种行为先验具有非平凡的零射击功能,并且可以通过模仿学习和加强学习,可以对其进行微调,以进行硬探索任务。不可能通过增强学习从头开始学习。对于许多任务,我们的模型都表现出人类水平的性能,我们是第一个报告可以制作钻石工具的计算机代理,这些工具可以花费超过20分钟(24,000个环境动作)的游戏玩法来实现。
translated by 谷歌翻译
Unsupervised source-free domain adaptation methods aim to train a model to be used in the target domain utilizing the pretrained source-domain model and unlabeled target-domain data, where the source data may not be accessible due to intellectual property or privacy issues. These methods frequently utilize self-training with pseudo-labeling thresholded by prediction confidence. In a source-free scenario, only supervision comes from target data, and thresholding limits the contribution of the self-training. In this study, we utilize self-training with a mean-teacher approach. The student network is trained with all predictions of the teacher network. Instead of thresholding the predictions, the gradients calculated from the pseudo-labels are weighted based on the reliability of the teacher's predictions. We propose a novel method that uses proxy-based metric learning to estimate reliability. We train a metric network on the encoder features of the teacher network. Since the teacher is updated with the moving average, the encoder feature space is slowly changing. Therefore, the metric network can be updated in training time, which enables end-to-end training. We also propose a metric-based online ClassMix method to augment the input of the student network where the patches to be mixed are decided based on the metric reliability. We evaluated our method in synthetic-to-real and cross-city scenarios. The benchmarks show that our method significantly outperforms the existing state-of-the-art methods.
translated by 谷歌翻译
The global Information and Communications Technology (ICT) supply chain is a complex network consisting of all types of participants. It is often formulated as a Social Network to discuss the supply chain network's relations, properties, and development in supply chain management. Information sharing plays a crucial role in improving the efficiency of the supply chain, and datasheets are the most common data format to describe e-component commodities in the ICT supply chain because of human readability. However, with the surging number of electronic documents, it has been far beyond the capacity of human readers, and it is also challenging to process tabular data automatically because of the complex table structures and heterogeneous layouts. Table Structure Recognition (TSR) aims to represent tables with complex structures in a machine-interpretable format so that the tabular data can be processed automatically. In this paper, we formulate TSR as an object detection problem and propose to generate an intuitive representation of a complex table structure to enable structuring of the tabular data related to the commodities. To cope with border-less and small layouts, we propose a cost-sensitive loss function by considering the detection difficulty of each class. Besides, we propose a novel anchor generation method using the character of tables that columns in a table should share an identical height, and rows in a table should share the same width. We implement our proposed method based on Faster-RCNN and achieve 94.79% on mean Average Precision (AP), and consistently improve more than 1.5% AP for different benchmark models.
translated by 谷歌翻译